by Daria Yuferova
This project is based on Data Analyst Job Postings [Pay, Skills, Benefits] dataset by Luke Barousse and his analysis (see license).
In his data collection process, Luke uses SerpAPI to pull job postings from Google's search results for Data Analyst positions in the United States. Data collection started on November 4th, 2022, and the results are updated daily - check it out, as well as his wonderful YouTube channel! Thank you Luke for everything you're doing for the Data Analytics community! 🙌
When examining Luke's results, I was surprised to see SQL (not Python!) at the very top of the skills for Data Analysts. So it only made sense for me to try and process this data using the employers' favorite tool!
The purpose of this project is mainly to practice my SQL skills, so this notebook only covers the analysis done in that language. For the prerequisite Python code, you can refer to my GitHub repository. Again, that code is mostly Luke's intellectual property; I just tweaked a couple of things and exported a CSV file with the few columns I needed.
Prepare all the necessary packages to work with SQL (and some Python) in Jupyter Notebook:
# pip install mysqlclient # to run SQL queries inside Jupyter Notebook with %%sql magic ✨
# %pip install jupysql --upgrade # to transform SQL query results into pandas dataframes
%load_ext sql
# %sql mysql:// # you need to enter your credentials here to connect to your SQL database
import MySQLdb
import pandas as pd
import plotly.express as px
Since we have the job postings data in CSV format (exported from the Python script), we need to import it into an SQL table before we can analyze it with SQL.
First we create a template for our future table specifying expected columns and datatypes:
%%sql
CREATE TABLE IF NOT EXISTS jobs (
title VARCHAR(255),
company_name VARCHAR(255),
location VARCHAR(255),
via VARCHAR(255),
date_time DATETIME,
salary_standardized VARCHAR(255),
description_tokens TEXT
);
Then we load the CSV file into a new table:
%%sql
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/jobs_sql.csv'
INTO TABLE jobs
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;
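If LOAD DATA INFILE is blocked on your server (e.g. by the secure_file_priv setting), the same import can be sketched in plain Python. This is only a portable illustration, using SQLite instead of MySQL; the file name and columns mirror our jobs table:

```python
import csv
import sqlite3

def load_jobs_csv(conn, csv_path):
    """Create the jobs table and bulk-insert rows from a CSV export."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            title TEXT, company_name TEXT, location TEXT, via TEXT,
            date_time TEXT, salary_standardized TEXT, description_tokens TEXT
        )""")
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (the IGNORE 1 ROWS equivalent)
        conn.executemany(
            "INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?, ?)", reader)
    conn.commit()
```

The csv module handles the quoting for us (the ENCLOSED BY '"' part of the SQL statement), so locations like "Kansas City, MO" stay in one field.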
We will need to transform the table for our needs: sort it by posting time, add an ID column, and clean up the salary and description-token columns.
First sort the table chronologically:
%%sql
ALTER TABLE jobs
ORDER BY date_time ASC;
Add an ID column based on chronological order:
%%sql
ALTER TABLE jobs
ADD ID INT AUTO_INCREMENT PRIMARY KEY FIRST;
Change the salary_standardized column's data type to float, paying attention to null values (reminder: right now salary_standardized is stored as a string, and empty strings cannot be converted to float directly).
%%sql
UPDATE jobs
SET salary_standardized = null
WHERE salary_standardized = '';
%%sql
ALTER TABLE jobs
MODIFY salary_standardized FLOAT(53);
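Two asides on this step: MySQL treats FLOAT(53) as double precision (which is why DESCRIBE later reports the column as double), and the empty-string-to-NULL pass is mandatory because an empty string is not a valid float. A minimal sketch of the per-value logic:

```python
def to_salary(raw):
    """Mimic the two-step conversion for a single value:
    empty strings become NULL first, everything else is cast to float."""
    if raw == "":
        return None    # UPDATE ... SET salary_standardized = NULL
    return float(raw)  # ALTER TABLE ... MODIFY salary_standardized FLOAT(53)
```

Without the first step, float("") raises a ValueError in Python, much like the failed string-to-float cast would error out in MySQL's strict mode.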
Let's also prepare the skills (description_tokens) column for future analysis by removing blank spaces:
%%sql
UPDATE jobs
SET description_tokens = REPLACE(description_tokens, ', ', ',');
Check the results:
%%sql
DESCRIBE jobs;
| Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|
| ID | int | NO | PRI | None | auto_increment |
| title | varchar(255) | YES | | None | |
| company_name | varchar(255) | YES | | None | |
| location | varchar(255) | YES | | None | |
| via | varchar(255) | YES | | None | |
| date_time | datetime | YES | | None | |
| salary_standardized | double | YES | | None | |
| description_tokens | text | YES | | None | |
%%sql
SELECT *
FROM jobs
LIMIT 5;
| ID | title | company_name | location | via | date_time | salary_standardized | description_tokens |
|---|---|---|---|---|---|---|---|
| 1 | DATA ANALYST w/INFORMATICA EXPERIENCE | Business Advisory Solutions Group (BASG) | United States | via Big Country Jobs | 2022-11-04 03:40:11 | None | integrity |
| 2 | Data Analyst II | EDWARD JONES | Bates City, MO | via My ArkLaMiss Jobs | 2022-11-04 03:40:11 | 103781.0 | power bi,sas,critical thinking,r,excel,sql,decision making,presentation skills,spss,python,snowflake |
| 3 | Data Analyst III - Full Time - Days | Mercy | Springfield, MO | via Careers At Mercy | 2022-11-04 03:40:11 | None | communication,go,excel,word,outlook,powerpoint |
| 4 | Sr. Business Data Analyst | New American Funding | United States | via KSNT Jobs | 2022-11-04 03:40:11 | 100000.0 | communication,power bi,tableau,ssrs,sql,snowflake,ssis |
| 5 | Sr. Data Analyst | Medasource | United States | via KSNT Jobs | 2022-11-04 03:40:11 | 110000.0 | communication,azure,interpersonal skills,excel,sql,integrity,problem solving |
Everything is in order!
Basic data exploration was previously done (see the Python code), but we will also explore the data with SQL.
Let's check how many job postings include salary data:
%%sql
SELECT COUNT(*) AS n_of_postings, COUNT(DISTINCT salary_standardized) AS salary_data,
COUNT(DISTINCT salary_standardized) / COUNT(*) AS salary_data_share
FROM jobs ;
| n_of_postings | salary_data | salary_data_share |
|---|---|---|
| 23745 | 542 | 0.0228 |
Only 542 distinct salary values appear among 23,745 postings - under 3%. (Strictly, COUNT(DISTINCT ...) counts unique values, not postings; COUNT(salary_standardized) would count the non-null rows, so the share of postings with any salary data may be a bit higher.) Either way, not much for a set of over 20k job postings.
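The difference between counting distinct values and counting non-null rows can be sketched on a toy table (SQLite here for portability; the values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (salary_standardized REAL)")
# three postings share one salary value, one posting has no salary data
conn.executemany("INSERT INTO jobs VALUES (?)",
                 [(100000.0,), (100000.0,), (100000.0,), (None,)])

distinct_vals, with_salary, total = conn.execute("""
    SELECT COUNT(DISTINCT salary_standardized),  -- unique values: 1
           COUNT(salary_standardized),           -- non-null rows: 3
           COUNT(*)                              -- all rows: 4
    FROM jobs""").fetchone()
```

So COUNT(DISTINCT ...) gives a lower bound on how many postings carry salary data, with the two counts coinciding only when every posted salary is unique.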
Check the limits of salary data:
%%sql
SELECT MIN(salary_standardized), MAX(salary_standardized)
FROM jobs;
| MIN(salary_standardized) | MAX(salary_standardized) |
|---|---|
| 18720.0 | 624000.0 |
The main reason for such a ~wild~ wide range of values is that our dataset includes not only full-time jobs but also part-time, freelance, and project positions, where the compensation offered rarely scales correctly to an annual format.
Let's find the top 10 most popular job titles:
%%sql
SELECT title, COUNT(*) AS freq
FROM jobs
GROUP BY title
ORDER BY freq DESC
LIMIT 10;
| title | freq |
|---|---|
| Data Analyst | 3017 |
| Senior Data Analyst | 830 |
| Data Analyst II | 385 |
| Lead Data Analyst | 304 |
| Sr. Data Analyst, Marketing Operations | 271 |
| Business Data Analyst | 251 |
| Data Scientist | 203 |
| Data Analyst III | 180 |
| Sr. Data Analyst | 130 |
| Marketing Data Analyst | 126 |
Interesting. Our TOP 10 mostly consists of Data Analysts of different levels and domains.
But are these also the TOP 10 best-compensated job titles?
%%sql
SELECT title, ROUND(AVG(salary_standardized),1) AS avg_salary
FROM jobs
GROUP BY title
ORDER BY avg_salary DESC
LIMIT 10;
| title | avg_salary |
|---|---|
| Freelance Data Analyst | 624000.0 |
| Consulting from Data buyer | 572000.0 |
| Looking for a EXPERT Equity Research Analyst - Contract to Hire | 468000.0 |
| Data Analyst for Ecommerce and Marketing Agency | 322400.0 |
| Analyze census data | 312000.0 |
| Cluster Analysis Expert | 312000.0 |
| Designing an Excel Scorecard for our Company | 286000.0 |
| Amazon PPC Data Analyst | 260000.0 |
| PLS, ensamble models, decision tree algorithms | 260000.0 |
| Market research data analyst - Contract to Hire | 260000.0 |
Again, we see unusually high salaries and unusual titles. Let's make sure we're not looking at freelance postings coming via Upwork, as they tend to carry poorly scalable hourly/per-project salary data:
%%sql
SELECT title, via, ROUND(AVG(salary_standardized),1) AS avg_salary
FROM jobs
WHERE via != 'via Upwork'
GROUP BY title, via
ORDER BY avg_salary DESC
LIMIT 10;
| title | via | avg_salary |
|---|---|---|
| Principal Data Engineer/Architect | via LinkedIn | 260000.0 |
| SAP S4 HANA BRIM Analyst | via LinkedIn | 236683.2 |
| Clinical Mapping Data Analyst | via LinkedIn | 235040.0 |
| Contract Data Scientist/Engineer - $125 p/h | via LinkedIn | 234000.0 |
| Data Scientist, Analytics | via Ai-Jobs.net | 233500.0 |
| Lead Cybersecurity Data Analyst | via Indeed | 232500.0 |
| Founding Data Scientist | via LinkedIn | 225000.0 |
| Director Data Scientist - Commercial Platforms (P2474). | via Ai-Jobs.net | 221875.0 |
| Data Analytics & Engineering - Data Analyst V Data Analyst V | via Mindlance - Talentify | 220480.0 |
| Data Engineer (Hybrid) | via Ai-Jobs.net | 217500.0 |
Now we encounter more Data Science and Data Engineering positions at the top!
The titles used vary widely and are highly personalized, so it's hard to draw any conclusions about which areas of Data Analytics pay the most.
Let's create wider title groups:
%%sql
SELECT
CASE
WHEN title LIKE '%senior%' OR title LIKE '%lead%' OR title LIKE '%sr%'
OR title LIKE '%analyst IV%' THEN 'Senior Data Analyst'
WHEN title LIKE '%mid data analyst%' OR title LIKE '%mid%'
OR title LIKE '%analyst II%' THEN 'Middle Data Analyst'
WHEN title LIKE '%junior data analyst%' OR title LIKE '%analyst I%' THEN 'Junior Data Analyst'
WHEN title LIKE '%data engineer%' THEN 'Data Engineer'
WHEN title LIKE '%data scientist%' THEN 'Data Scientist'
WHEN title LIKE '%data analyst%' THEN 'Data Analyst'
WHEN title LIKE '%analyst%' THEN 'Analyst'
ELSE 'Other'
END AS title_cat, ROUND(AVG(salary_standardized),1) AS avg_salary
FROM jobs
GROUP BY title_cat
ORDER BY avg_salary DESC;
| title_cat | avg_salary |
|---|---|
| Data Engineer | 129481.3 |
| Data Scientist | 125937.8 |
| Senior Data Analyst | 108351.7 |
| Analyst | 101062.2 |
| Middle Data Analyst | 92079.6 |
| Data Analyst | 89800.9 |
| Other | 89785.5 |
| Junior Data Analyst | 58249.9 |
Again, Data Engineers and Data Scientists enjoy the most generous offers. This data would be most interesting for Junior Data Analysts entering the field, as it shows the average reference level of salaries mentioned in US job postings over the last 8 months.
For further analysis it would be interesting to split the dataset by time periods and check if there is any trend in the average salary over time, provided we have the data covering a longer period.
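The CASE bucketing above can be mirrored in plain Python, where the first matching rule wins, exactly as in SQL. One quirk worth noting: LIKE '%...%' is a substring test, so pattern order matters, and since 'analyst ii' is a substring of 'analyst iii', titles like "Data Analyst III" land in the middle bucket. This sketch reproduces the query's logic, quirk included:

```python
# First matching rule wins, mirroring the order of the WHEN branches.
RULES = [
    (("senior", "lead", "sr", "analyst iv"), "Senior Data Analyst"),
    (("mid data analyst", "mid", "analyst ii"), "Middle Data Analyst"),
    (("junior data analyst", "analyst i"), "Junior Data Analyst"),
    (("data engineer",), "Data Engineer"),
    (("data scientist",), "Data Scientist"),
    (("data analyst",), "Data Analyst"),
    (("analyst",), "Analyst"),
]

def categorize(title):
    """Bucket a job title the way the SQL CASE expression does:
    case-insensitive substring matching, first rule wins, else 'Other'."""
    t = title.lower()
    for patterns, category in RULES:
        if any(p in t for p in patterns):
            return category
    return "Other"
```

Such a function is handy for unit-testing the bucketing rules on tricky titles before trusting the SQL aggregate.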
Next up - companies: who pays more VS who offers more jobs?
%%sql
SELECT company_name, ROUND(AVG(salary_standardized),1) AS avg_salary, COUNT(*) AS n_of_postings
FROM jobs
GROUP BY company_name
ORDER BY n_of_postings DESC -- or ORDER BY avg_salary DESC
LIMIT 10;
| company_name | avg_salary | n_of_postings |
|---|---|---|
| Upwork | 87550.2 | 3864 |
| EDWARD JONES | 122218.9 | 727 |
| Walmart | 110500.0 | 703 |
| Corporate | 58240.0 | 606 |
| Cox Communications | 97068.9 | 510 |
| Talentify.io | None | 414 |
| Dice | None | 258 |
| Insight Global | 101845.2 | 214 |
| Staffigo Technical Services, LLC | None | 167 |
| Centene Corporation | None | 117 |
Most job offers come from the freelance platform Upwork or from big players like Walmart. Even among the most active companies, some do not reveal any salary data in their job postings.
Now that we're familiar with the data, let's look into the skills mentioned most often in the job descriptions and the compensation offered for those skills.
We will also visualize the findings using Plotly, an interactive graphing library for Python.
Visualizing SQL queries is not a trivial task, but the JupySQL tool makes it possible by linking the results of SQL queries with a Pandas (or Polars) dataframe. I find it very inspiring to work with cross-language tools that make life easier for data analysts!
Iterating through lists and counting instances with Python's list comprehensions is easy and elegant.
In SQL, however, I managed it by creating a skills table (a poor substitute for normalization, but alas) and using the FIND_IN_SET() function, which returns the position of a string within a comma-separated list, to join the tables (this is why we removed all spaces from the description_tokens column: the function expects a strictly comma-separated list).
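FIND_IN_SET's behavior, and why the spaces had to go, can be sketched in a few lines of Python:

```python
def find_in_set(needle, csv_list):
    """Mimic MySQL's FIND_IN_SET: the 1-based position of needle in a
    comma-separated list, or 0 if absent. Items are compared exactly,
    which is why ', ' had to be collapsed to ',' beforehand."""
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0
```

With leftover spaces, every item after a ', ' separator would carry a leading space and never match a clean skill name, so the join would silently miss it.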
First, we create a new table in our database and fill it with the skills I chose to analyze, plus a skill category - hard or soft:
%%sql
CREATE TABLE IF NOT EXISTS skills (
ID INT AUTO_INCREMENT PRIMARY KEY,
skill VARCHAR(255),
type VARCHAR(255)
);
%%sql
INSERT INTO skills (skill, type)
VALUES
('python', 'hard skill'), ('sql', 'hard skill'), ('r', 'hard skill'), ('power bi', 'hard skill'),
('tableau', 'hard skill'), ('github', 'hard skill'), ('machine learning', 'hard skill'), ('aws', 'hard skill'),
('data lake', 'hard skill'), ('excel', 'hard skill'), ('spss', 'hard skill'), ('airflow', 'hard skill'),
('jupyter', 'hard skill'), ('azure', 'hard skill'), ('bigquery', 'hard skill'),
('communication', 'soft skill'), ('flexibility', 'soft skill'), ('initiative', 'soft skill'),
('integrity', 'soft skill'), ('critical thinking', 'soft skill'), ('decision making', 'soft skill'),
('problem solving', 'soft skill'), ('analytical thinking', 'soft skill'), ('strategic thinking', 'soft skill'),
('presentation skills', 'soft skill');
We save the SQL query that calculates the number of mentions and the average salary across those mentions per skill, then convert it to a Pandas dataframe for plotting:
%%sql --save skills_pay
SELECT s.skill, s.type, COUNT(*) AS n_of_postings, ROUND(AVG(j.salary_standardized),0) AS avg_salary
FROM jobs AS j
INNER JOIN skills AS s
ON FIND_IN_SET(s.skill, j.description_tokens)
GROUP BY s.skill, s.type
ORDER BY n_of_postings DESC;
| skill | type | n_of_postings | avg_salary |
|---|---|---|---|
| sql | hard skill | 12117 | 100755.0 |
| communication | soft skill | 9832 | 97872.0 |
| excel | hard skill | 8197 | 89336.0 |
| python | hard skill | 6637 | 106027.0 |
| tableau | hard skill | 6612 | 99766.0 |
| power bi | hard skill | 6104 | 98281.0 |
| r | hard skill | 4248 | 102358.0 |
| integrity | soft skill | 3017 | 95300.0 |
| flexibility | soft skill | 2246 | 105609.0 |
| decision making | soft skill | 1984 | 102368.0 |
# save SQL query as CTE:
result = %sql SELECT * FROM skills_pay
# save CTE as dataframe:
df = result.DataFrame()
Now, let's plot the results for hard and soft skills. The graph is interactive - try hovering over the bars:
fig = px.bar(df, x='skill', y='n_of_postings', color='avg_salary',
facet_row='type', width=800, height=600,
hover_data={'avg_salary':':$,', 'n_of_postings':':,'},
labels={'n_of_postings': 'job postings',
'avg_salary' : 'average salary'},
title='Popularity and Worth of Skills in Data Analyst Job Postings')
fig.show()
We can see that the most frequently mentioned technical skill for Data Analysts is SQL (still going strong!), and it pays around $100k per year. Excel, being the second most popular hard skill, is one of the least compensated on average. Python (pays well!), Tableau and Power BI round out the TOP 5.
It's interesting to see that more advanced concepts and specific tools are associated with higher pay but are rarely mentioned (e.g., airflow, AWS, machine learning).
Both in terms of pay and the number of mentions, we see that soft skills lag behind, which is understandable given the technical nature of Data Analytics. However, among soft skills described in job postings, communication is by far the most popular - representing the bridge between the analytical part of the job and its storytelling aspect.
Finally, let's check if there is a connection between the job location and the number of postings/average salary offered:
%%sql
SELECT location, ROUND(AVG(salary_standardized),1) AS avg_salary, COUNT(*) AS n_of_postings
FROM jobs
GROUP BY location
ORDER BY n_of_postings DESC -- or ORDER BY avg_salary DESC
LIMIT 10;
| location | avg_salary | n_of_postings |
|---|---|---|
| Anywhere | 94046.9 | 10714 |
| United States | 104499.4 | 5889 |
| Kansas City, MO | 83575.8 | 860 |
| Oklahoma City, OK | 87415.0 | 600 |
| Jefferson City, MO | 91201.3 | 507 |
| Tulsa, OK | 91960.7 | 348 |
| Bentonville, AR | 98365.2 | 292 |
| Wichita, KS | 75741.1 | 272 |
| Columbia, MO | 88666.2 | 231 |
| Overland Park, KS | 91905.6 | 213 |
Most of the job postings don't specify a location, which might indicate a prevalence of remote work. As for the job postings with specific locations, it's hard to see any trends here.
I would expect to see a connection between state population and job postings activity (more people - more job opportunities), as well as between GDP by state and average salary offered there. Let's check these hypotheses.
For that, we add the population and GDP data (by US state, from worldpopulationreview.com) to the analysis. We will use GDP data for the last quarter of 2022 to see the differences between states more clearly.
Again, first creating an SQL table and uploading CSV file data:
%%sql
CREATE TABLE IF NOT EXISTS states_gdp (
state VARCHAR(255),
gdpPerCapita2022 DOUBLE,
n VARCHAR(255),
gdpQ42022 DOUBLE,
pop2023 DOUBLE
);
%%sql
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/states_gdp.csv'
INTO TABLE states_gdp
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS;
We will also need a US state name/abbreviation lookup table (from worldpopulationreview.com), since our jobs table uses abbreviations:
%%sql
CREATE TABLE IF NOT EXISTS states_abbr (
abbr VARCHAR(255),
state VARCHAR(255)
);
%%sql
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/states_abbr_name.csv'
INTO TABLE states_abbr
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
The last piece of the puzzle is geodata on state coordinates for our visualization (from Google Github repository, see license):
%%sql
CREATE TABLE IF NOT EXISTS states_coord (
abbr VARCHAR(255),
lat DOUBLE,
lon DOUBLE,
state VARCHAR(255)
);
%%sql
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/states_coord.csv'
INTO TABLE states_coord
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
Now we're ready to join all the tables on the US state name/abbreviation columns. We want one table containing: state name with coordinates, population, GDP, and the number and average salary of job postings.
%%sql --save states
SELECT g.state AS state, c.lat, c.lon, ROUND(AVG(g.pop2023),0) AS population,
g.gdpQ42022*1000000 AS gdp,
ROUND(AVG(j.salary_standardized),1) AS avg_salary, COUNT(*) AS n_of_postings
FROM jobs AS j
INNER JOIN states_abbr AS a
ON RIGHT(j.location, 2) = a.abbr
INNER JOIN states_gdp AS g
ON a.state = g.state
INNER JOIN states_coord AS c
ON g.state = c.state
GROUP BY 1,2,3,5;
| state | lat | lon | population | gdp | avg_salary | n_of_postings |
|---|---|---|---|---|---|---|
| Missouri | 37.964253 | -91.831833 | 6186091.0 | 358840000000.0 | 105743.9 | 2773 |
| Oklahoma | 35.007752 | -97.092877 | 4048375.0 | 226304000000.0 | 94098.2 | 1508 |
| Arkansas | 35.20105 | -91.831833 | 3063152.0 | 162459000000.0 | 96833.8 | 1107 |
| Rhode Island | 41.580095 | -71.477429 | 1090483.0 | 72935000000.0 | 115880.4 | 91 |
| Kansas | 39.011902 | -98.484246 | 2936378.0 | 180925000000.0 | 91736.9 | 1390 |
| California | 36.778261 | -119.417932 | 38915693.0 | 3104993000000.0 | 112115.6 | 114 |
| Massachusetts | 42.407211 | -71.382437 | 6974258.0 | 612246000000.0 | 42820.0 | 40 |
| Texas | 31.968599 | -99.901813 | 30500280.0 | 1919867000000.0 | 98378.0 | 37 |
| District of Columbia | 38.905985 | -77.033418 | 674815.0 | 66093000000.0 | None | 2 |
| Nebraska | 41.492537 | -99.901813 | 1972292.0 | 127422000000.0 | 35796.8 | 1 |
# save SQL query as CTE:
result1 = %sql SELECT * FROM states
# save CTE as dataframe:
df1 = result1.DataFrame()
The JupySQL message shows that only 14 US states (one of which is the District of Columbia) are mentioned in our dataset. The other postings list "United States" or "Anywhere" as the search location, as we saw earlier.
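The join key RIGHT(j.location, 2) can be sketched in Python; the state set below is an illustrative subset, not the full lookup table. Locations like "Anywhere" or "United States" end in characters that match no abbreviation, so those rows simply drop out of the INNER JOIN:

```python
VALID_ABBR = {"MO", "OK", "AR", "KS", "CA", "TX"}  # illustrative subset

def state_abbr(location):
    """Mimic RIGHT(location, 2) joined against the abbreviation table:
    keep the last two characters if they form a known state
    abbreviation, else None (the row fails the INNER JOIN in SQL)."""
    abbr = location[-2:]
    return abbr if abbr in VALID_ABBR else None
```

This also shows why the query needs no explicit filter for "Anywhere" postings: the INNER JOIN discards them for free.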
Let's put our data on the US map to examine if there is a connection between the state population and Data Analyst job postings activity there (again - the map is interactive!):
fig = px.scatter_mapbox(df1, lat="lat", lon="lon", hover_name='state',
hover_data={'lat':False, 'lon':False, 'n_of_postings':False,
'gdp':':$,.0f', 'population':':,', 'avg_salary':':$,'},
zoom=3, height=300, size='gdp', color='avg_salary',
labels = {'avg_salary' : 'average salary', 'gdp' : 'GDP'},
title = 'GDP by US State VS Average Salary in Data Analyst Job Postings')
fig.update_layout(mapbox_style="open-street-map", mapbox_center = {"lat": 37.0902, "lon": -95.7129})
fig.update_layout(margin={"r":0,"t":30,"l":0,"b":0})
fig.show()
The size of data points represents GDP by state in Q4 2022, color gradient - average salary, and state population can be seen when hovering over the map.
fig = px.scatter_mapbox(df1, lat="lat", lon="lon", hover_name='state',
hover_data={'lat':False, 'lon':False,
'population':':,', 'n_of_postings':':,', 'avg_salary':':$,'},
zoom=3, height=300, size='population', color='n_of_postings',
labels = {'avg_salary' : 'average salary', 'n_of_postings' : 'job postings'},
title = 'US States Population VS Number of Data Analyst Job Postings')
fig.update_layout(mapbox_style="open-street-map", mapbox_center = {"lat": 37.0902, "lon": -95.7129})
fig.update_layout(margin={"r":0,"t":30,"l":0,"b":0})
fig.show()
The size of data points represents state population, color gradient - the number of job postings, and average salary can be seen when hovering over the map.
The two maps show that, despite its small population, Missouri leads the way in Data Analyst job openings while maintaining high offered salaries. The neighbouring states of Arkansas, Oklahoma, and Kansas have also been actively hiring data analysts. Big East Coast players such as New York, New Jersey, and Pennsylvania tend to be very selective in their job postings and do not disclose salary data.
The examples of California, Texas, and New York, with their high populations but only a few job postings, show that my hypothesis is not confirmed. Clearly, any significant conclusions would require more data, but it is very interesting (and might benefit analysts looking for a job) to see how local companies behave when hiring.
For now, let's investigate why Missouri is such an active Data Analyst "employer", both in the number of job postings and in the average salary offered:
%%sql
SELECT company_name, COUNT(*) AS n_of_postings, ROUND(AVG(salary_standardized), 1) AS avg_salary
FROM jobs
WHERE RIGHT(location, 2) = 'MO'
GROUP BY company_name
ORDER BY n_of_postings DESC
LIMIT 10;
| company_name | n_of_postings | avg_salary |
|---|---|---|
| EDWARD JONES | 701 | 123017.9 |
| Corporate | 533 | 58240.0 |
| Walmart | 91 | None |
| State of Missouri | 75 | 57543.2 |
| Commercial Solutions | 53 | None |
| Radancy | 50 | None |
| COMMERCE BANK | 50 | 90000.0 |
| Centene Corporation | 47 | None |
| Government Employees Health Association, Inc. - GEHA | 37 | None |
| Cynet Systems | 35 | None |
It looks like EDWARD JONES, a financial services firm headquartered in St. Louis, Missouri, is actively looking for Data Analysts for its many branch offices across the state. It is nice to see a number of public sector positions open for data analysts as well.
In this project I got a chance to use my MySQL, JupySQL, Python, and Plotly skills for a simple but practical analysis. We learned the scope of actual Data Analyst job postings in the United States over the last several months, with rankings, job title and skills analysis, salary data, and geodata.
With a background in Compensation and Benefits (HR), I find it fascinating to see job postings provide real, data-driven insights for those who, like myself, are interested in this career.
The End. And, again, check out Luke's channel for more inspiration! :)